Usage: Heartbeat should not schedule usage job when a job is already running #12616
DaanHoogland merged 4 commits into apache:4.20
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## 4.20 #12616 +/- ##
============================================
- Coverage 16.26% 16.25% -0.01%
+ Complexity 13429 13423 -6
============================================
Files 5661 5662 +1
Lines 500010 500148 +138
Branches 60715 60729 +14
============================================
- Hits 81331 81310 -21
- Misses 409606 409753 +147
- Partials 9073 9085 +12
|
@blueorangutan package |
|
@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16763 |
|
@blueorangutan package |
|
@abh1sar a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16789 |
|
@blueorangutan test |
|
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-15433)
|
|
@blueorangutan package |
|
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16808 |
|
@blueorangutan test |
|
@abh1sar a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
[SF] Trillian Build Failed (tid-15444) |
|
@blueorangutan test keepEnv |
|
@abh1sar a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
RosiKyu left a comment
LGTM
TC: Usage Heartbeat Does Not Schedule Duplicate Jobs When a Job Is Already Running
Objective
Verify that the fix in PR #12616 prevents the usage heartbeat from scheduling
duplicate jobs when usage processing is behind, following the developer's test methodology.
Test Steps
- Stop Usage server
- Set usage.stats.job.aggregation.range to 5
- Start Usage server and wait for a usage job to finish
- Stop Usage server
- Generate events using cmk: create VMs, volumes, and networks, and acquire IP addresses
- Start Usage server after events are generated
- Observe that no failed usage jobs are present in the database
Expected Result:
All usage jobs complete successfully with no failed or duplicate jobs in the database.
Usage records are generated correctly for all resources.
Actual Result: PASS
All jobs completed successfully (success=1). Zero failed jobs. Usage records
grew from 1038 to 1208. Jobs are sequential with no overlaps or duplicates.
Test Evidence:
Step 1: Stop Usage server
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# systemctl stop cloudstack-usage
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# systemctl status cloudstack-usage
× cloudstack-usage.service - CloudStack Usage Server
Loaded: loaded (/usr/lib/systemd/system/cloudstack-usage.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2026-02-13 09:19:16 UTC; 4s ago
Step 2: Set aggregation range to 5
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# cloudmonkey update configuration name=usage.stats.job.aggregation.range value=5
{
"configuration": {
"category": "Usage",
"defaultvalue": "1440",
"description": "The range of time for aggregating the user statistics specified in minutes",
"name": "usage.stats.job.aggregation.range",
"value": "5"
}
}
Step 3: Start Usage server, wait for job to finish
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# cloudmonkey update configuration name=usage.stats.job.exec.time value=09:22
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# systemctl start cloudstack-usage
mysql> SELECT id, start_millis, end_millis, success FROM cloud_usage.usage_job ORDER BY id DESC LIMIT 5;
+----+---------------+---------------+---------+
| id | start_millis | end_millis | success |
+----+---------------+---------------+---------+
| 74 | 0 | 0 | NULL |
| 73 | 1770974220000 | 1770974519999 | 1 |
| 71 | 1770973620000 | 1770974219999 | 1 |
| 70 | 1770973020000 | 1770973619999 | 1 |
| 69 | 1770972420000 | 1770973019999 | 1 |
+----+---------------+---------------+---------+
5 rows in set (0.00 sec)
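As a quick sanity check on the output above, the length of a job's window can be derived from its start_millis and end_millis. The following is a minimal sketch in shell; the function name window_minutes is hypothetical, and it assumes end_millis is inclusive (which the ...999 endings in the table suggest).

```shell
#!/usr/bin/env bash
# Hypothetical helper: length of a usage job window in whole minutes.
# Assumes end_millis is inclusive, so one millisecond is added back
# before converting milliseconds to minutes.
window_minutes() {
    local start_millis=$1 end_millis=$2
    echo $(( (end_millis - start_millis + 1) / 60000 ))
}

# Values copied from job 73 in the query output above.
window_minutes 1770974220000 1770974519999
```

For job 73 this prints 5, which matches the aggregation range set in Step 2.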
Step 4: Stop Usage server
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# systemctl stop cloudstack-usage
Step 5: Generate events via cmk
Deployed 20 VMs (the loop shown submits VMs 7 through 20):
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# for i in $(seq 7 20); do
    cloudmonkey deploy virtualmachine serviceofferingid=93beb720-3d80-419c-bc66-27b67ee6b616 \
        zoneid=34f66052-b117-45f9-8717-be13ce0b65fd \
        templateid=fad22f8e-0818-11f1-9d42-1e0093000162 name=usagetest-vm-$i
    echo "Submitted VM $i"
done
Submitted VM 7
Submitted VM 8
...
Submitted VM 20
Created 10 volumes:
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# for i in $(seq 1 10); do
    cloudmonkey create volume name=usagetest-vol-$i \
        zoneid=34f66052-b117-45f9-8717-be13ce0b65fd \
        diskofferingid=$(cloudmonkey list diskofferings filter=id | grep -m1 '"id"' | awk -F'"' '{print $4}')
    echo "Created volume $i"
done
Created volume 1
...
Created volume 10
Acquired 5 public IPs:
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# for i in $(seq 1 5); do
    cloudmonkey associate ipaddress zoneid=34f66052-b117-45f9-8717-be13ce0b65fd
    echo "Acquired IP $i"
done
Acquired IP 1
...
Acquired IP 5
Verified unprocessed events:
mysql> SELECT COUNT(*), processed FROM cloud.usage_event GROUP BY processed;
+----------+-----------+
| COUNT(*) | processed |
+----------+-----------+
| 89 | 0 |
+----------+-----------+
1 row in set (0.00 sec)
Step 6: Start Usage server
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# cloudmonkey update configuration name=usage.stats.job.exec.time value=09:28
[root@ref-trl-10994-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# systemctl start cloudstack-usage
Step 7: Verify no failed jobs
mysql> SELECT id, start_millis, end_millis, success FROM cloud_usage.usage_job ORDER BY id DESC LIMIT 10;
+----+---------------+---------------+---------+
| id | start_millis | end_millis | success |
+----+---------------+---------------+---------+
| 76 | 0 | 0 | NULL |
| 75 | 1770974520000 | 1770974879999 | 1 |
| 73 | 1770974220000 | 1770974519999 | 1 |
| 71 | 1770973620000 | 1770974219999 | 1 |
| 70 | 1770973020000 | 1770973619999 | 1 |
| 69 | 1770972420000 | 1770973019999 | 1 |
| 68 | 1770971820000 | 1770972419999 | 1 |
| 67 | 1770971220000 | 1770971819999 | 1 |
| 65 | 1770968520000 | 1770971219999 | 1 |
| 61 | 1770967920000 | 1770968519999 | 1 |
+----+---------------+---------------+---------+
10 rows in set (0.00 sec)
mysql> SELECT COUNT(*) FROM cloud_usage.usage_job WHERE success = 0;
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.00 sec)
mysql> SELECT COUNT(*) FROM cloud_usage.cloud_usage;
+----------+
| COUNT(*) |
+----------+
| 1208 |
+----------+
1 row in set (0.01 sec)
Result: Zero failed jobs. All jobs sequential and successful. 1208 usage records
generated correctly for 23 VMs, 10 volumes, and 5 public IPs.
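The "sequential with no overlaps" claim can also be checked mechanically rather than by eye. The sketch below is one way to do that in shell with awk; check_windows is a hypothetical helper, not part of CloudStack, and the rows are the completed jobs from the Step 7 usage_job query, listed oldest first as "id start_millis end_millis".

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "id start_millis end_millis" rows on stdin,
# oldest first, and reports any overlap or gap between consecutive windows.
check_windows() {
    awk '
        BEGIN { bad = 0; seen = 0 }
        $2 == 0 { next }   # skip a pending job that has no window yet
        seen && $2 <= prev_end     { printf "overlap before job %s\n", $1; bad = 1 }
        seen && $2 >  prev_end + 1 { printf "gap before job %s\n", $1; bad = 1 }
        { prev_end = $3 + 0; seen = 1 }
        END { if (!bad) print "contiguous"; exit bad }
    '
}

# Completed jobs copied from the Step 7 query output.
check_windows <<'EOF'
61 1770967920000 1770968519999
65 1770968520000 1770971219999
67 1770971220000 1770971819999
68 1770971820000 1770972419999
69 1770972420000 1770973019999
70 1770973020000 1770973619999
71 1770973620000 1770974219999
73 1770974220000 1770974519999
75 1770974520000 1770974879999
EOF
```

With these rows the helper prints "contiguous": each window starts exactly one millisecond after the previous one ends. Treating a gap as a failure is a choice made for this sketch; a stricter or looser rule may suit other setups.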
|
[SF] Trillian Build Failed (tid-15455) |
Description
This PR fixes #12424
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
usage.stats.job.aggregation.range to 5
How did you try to break this feature and the system with this change?